Abstract
Background: Although the human genome sequence was declared complete in 2004, the sequence was interrupted by 341 gaps, of which 308 lay in an estimated 28 Mb of euchromatin. While these gaps constitute only approximately 1% of the sequence, knowledge of the full complement of human genes and regulatory elements is incomplete without their sequences.

Results: We have used a combination of conventional chromosome walking (aided by the availability of end sequences) in fosmid and bacterial artificial chromosome (BAC) libraries, whole-chromosome shotgun sequencing, comparative genome analysis and long PCR to finish 8 of the 11 gaps in the initial chromosome 22 sequence. In addition, we have patched four regions of the initial sequence where the original clones were found to be deleted, or contained a deletion allele of a known gene, with a further 126 kb of new sequence. More than 1.018 Mb of new sequence has been generated to extend into and close the gaps, and we have annotated 16 new or extended gene structures and one pseudogene.

Conclusion: Thus, we have made significant progress towards completing the sequence of the euchromatic regions of human chromosome 22 using a combination of detailed approaches. Our experience suggests that substantial work remains to close the outstanding gaps in the human genome sequence.

Background

The completion of the human genome sequence was the culmination of the 15-year Human Genome Project. The finished sequence contained 2.85 Gb and was estimated to cover 99% of the euchromatin [1]. Thus far the human genome is the only gigabase-scale sequence to reach the high accuracy and near completeness needed to be published as a 'finished' standard, although the mouse genome is expected to join it soon.
Published: 13 May 2008. Genome Biology 2008, 9:R78 (doi:10.1186/gb-2008-9-5-r78). Received: 19 February 2008; Revised: 10 April 2008; Accepted: 13 May 2008. The electronic version of this article is the complete one and can be found online at http://genomebiology.com/2008/9/5/R78. Cole et al., Genome Biology 2008, Volume 9, Issue 5, Article R78.

However, although significant efforts were made to obtain maximum continuity, the sequence was interrupted by 341 gaps. Of these, 308 gaps spanned approximately 28 Mb of euchromatin, while the remainder represented the heterochromatin, chiefly centromeres and telomeres. While finishing the sequence was a major milestone, for completists there remain the nagging questions of whether it is possible to close the gaps, and what lies in those missing sequences.

The human genome was sequenced using two approaches: whole-genome shotgun [2] and map-based clone sequencing [3]. However, only the clone-based strategy, which used genome maps and large-insert clones, allowed ready application of directed strategies for completing the sequence [1]. The clone-based strategy involved building contiguous maps of the human chromosomes in large-insert cloning vectors such as bacterial artificial chromosomes (BACs), resolved at a local level by restriction enzyme fingerprinting and ordered and orientated with respect to longer-range maps of the genome [4,5]. Individual BACs were then selected from the maps to create a set of clones that minimally covered the genome for sequencing. In the first instance, the tile-path BACs were subjected to shotgun sequencing and assembled to produce the draft-quality genome sequence.
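Choosing a set of clones that minimally covers a mapped region can be illustrated as a classic interval-cover problem. The sketch below is only an illustration under stated assumptions: each BAC is taken to have already been placed on its fingerprint contig as a (start, end) interval in kb, the clone names and coordinates are hypothetical, and the greedy "extend furthest" rule is a textbook stand-in, not the selection procedure the sequencing consortium actually used.

```python
from typing import NamedTuple


class Clone(NamedTuple):
    """A mapped clone placed on a contig (hypothetical coordinates, in kb)."""
    name: str
    start: int
    end: int


def minimal_tile_path(clones: list[Clone], region_start: int, region_end: int) -> list[Clone]:
    """Greedily pick the fewest clones covering [region_start, region_end].

    Illustrative textbook greedy, not the consortium's method: at each step,
    among clones starting at or before the current covered position, take the
    one reaching furthest right. Raises ValueError if nothing spans the next
    position, i.e. the region contains a map gap.
    """
    ordered = sorted(clones, key=lambda c: c.start)
    path: list[Clone] = []
    covered = region_start
    i = 0
    while covered < region_end:
        best = None
        # Consider every clone that starts within the already-covered region.
        while i < len(ordered) and ordered[i].start <= covered:
            if best is None or ordered[i].end > best.end:
                best = ordered[i]
            i += 1
        if best is None or best.end <= covered:
            raise ValueError(f"map gap after position {covered} kb")
        path.append(best)
        covered = best.end
    return path


# Hypothetical contig of five overlapping BACs.
contig = [
    Clone("bA", 0, 160),
    Clone("bB", 20, 150),
    Clone("bC", 120, 300),
    Clone("bD", 140, 280),
    Clone("bE", 290, 450),
]
print([c.name for c in minimal_tile_path(contig, 0, 450)])  # -> ['bA', 'bC', 'bE']
```

Greedy selection by furthest right end is optimal for covering a single interval, which is why it is a convenient teaching model; real tile-path selection also weighed overlap size, clone quality and fingerprint evidence.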
Progressing from this point to a complete sequence by the process of finishing required two major components: first, the clone maps had to be completed so that substrates were available for sequencing; and second, the sequence within each clone had to be refined to the highest level of accuracy with no gaps. Thus, gaps in the genome sequence could be of three kinds. First, there could be gaps within individual clone sequences where either the sequence could not be determined or there was ambiguity or error in the base call (sequence gaps/errors) [6]. Second, there could be gaps where no clone was available from the map for sequencing, including, but not restricted to, heterochromatic and segmentally duplicated regions (map gaps) [7]. Third, a gap could result from a problem with the shotgun assembly or with the underlying BAC, such as a deletion resulting in a false join within the sequence (assembly or insertion/deletion errors) [6,8]. Quality assessments of the finished human genome sequence suggested that sequence gaps/errors were likely to occur at a rate (< 1/10 kb) an order of magnitude lower than the rate of human polymorphism, while mis-assembly or insertion/deletion errors were likely to be relatively few [1,6,9], although their precise number remained to be established at all resolutions. In addition, because the sequence assembly in the clone-based strategy was local to each clone, sequence or mis-assembly gaps were unlikely to affect substantial regions. On the other hand, the number of map gaps was well established, and the missing sequence at each gap was known to be on the order of 90 kb on average. Therefore, obtaining a complete reference human genome sequence requires identifying and sequencing new clones for map gaps, and finding and addressing each base ambiguity or error. This would entail a substantial genome curation activity designed to improve the coverage and accuracy of the sequence.
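As a quick consistency check on the figures quoted in the text, multiplying the 308 euchromatic map gaps by the roughly 90 kb missing per gap recovers the estimate of about 28 Mb, and dividing by the 2.85 Gb finished sequence gives roughly 1%. The sketch only reproduces this back-of-the-envelope arithmetic; the variable names are illustrative and the inputs are the approximate values stated above.

```python
# Approximate figures quoted in the text.
euchromatic_gaps = 308   # map gaps lying in euchromatin
mean_gap_kb = 90         # average missing sequence per map gap, kb
finished_gb = 2.85       # size of the finished human genome sequence, Gb

missing_mb = euchromatic_gaps * mean_gap_kb / 1_000   # kb -> Mb
fraction = missing_mb / (finished_gb * 1_000)         # Gb -> Mb

print(f"estimated missing euchromatin: {missing_mb:.1f} Mb")  # -> 27.7 Mb
print(f"fraction of finished sequence: {fraction:.2%}")       # -> 0.97%
```

Both results agree with the "approximately 28 Mb" and "approximately 1%" stated for the euchromatic gaps.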
In addition, the current reference sequence is a mosaic of clone sequences derived from more than eight individuals. For genes, it would be desirable for the allele in the reference sequence to be, as far as possible, representative of the functional form. For instance, the initial chromosome 22 sequence contained a deletion allele of the CYP2D6 gene [10]. Although this form is reasonably common in European populations, it would be preferable to have a complete version as the reference. Furthermore, in certain regions where there is extensive polymorphism, such as the human leukocyte antigen (HLA) locus, there are arguments for maintaining alternative versions of the common haplotype sequences [11]. Chromosome 22 was the first human chromosome to reach finished sequence standard [8]. On initial publication, the sequence comprised 12 contiguous segments spanning 33.4 Mb of 22q (Figure 1) and included known centromeric and subtelomeric heterochromatin repeats at either end. Four of the map gaps were located in 22q11 in regions associated with the segmental duplications involved in low-copy repeats (LCRs) on chromosome 22 (LCR22; here referred to as gaps A-D; Figure 1) [8,12-14]. The remaining seven gaps were located in the G+C-rich region of 22q13.3 (gaps 1-7; Figure 1) and are not obviously associated with copy number variations (CNVs) in the latest CNV data [15], although CNVs occurring in the gaps would not have been detected. Since the initial publication, we have been working towards closing these gaps, particularly in the 22q13.3 region that was the responsibility of our group in the original chromosome 22 sequencing consortium. Here we report our approaches and progress towards completion of the human chromosome 22 sequence. Our experiences may be pertinent to future efforts to curate the human genome reference sequence.
Results and discussion

In the following sections we describe our approaches and results towards correcting deletions and closing map gaps on human chromosome 22. The clone library resources used and the information required to decode clone prefixes are provided in Additional data file 1. For reference, the positions of the gaps and deletions to which we refer are detailed, on selected genome builds, in Additional data files 2 and 3.

Updating the chromosome 22 sequence to correct deletion alleles and deleted BAC clones

The initial chromosome 22 sequence included a P1 artificial chromosome (PAC) (RP1-257I20, AL021878) containing a common deletion allele of the CYP2D6 gene [10]. In order to represent this gene in a functional form in the reference sequence, we identified from the clone map an RPCI-4 PAC containing a full copy of CYP2D6 (RP4-669P10, BX247885; see Additional data file 1 for details of clone libraries used). This PAC was sequenced and this haplotype, constituting approximately 12 kb of additional sequence compared to the initial version, was incorporated into the reference sequence